Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upstream alphagov updates #11

Open
wants to merge 289 commits into
base: main
Choose a base branch
from
Open

Conversation

frankmolenaar1986
Copy link

upstream alphagov updates

quis and others added 30 commits June 3, 2024 13:59
Add a command to copy `pyproject.toml` into apps
This is better than looking at `requirements.in` because:
- `requirements.in` isn’t always present (for example when running apps
  in Docker)
- `requirements.txt` is more likely to reflect what’s actually installed
  (since that’s what we tell `pip` to look at)
Look at `requirements.txt` to determine installed version
NotifyTask: include pid and other structured fields in completion logs
For context, when the admin app wants to get data from our API
it will first try and get the data from redis and if the data
doesn't exist in redis, it will get the data from the API and then
finish by setting that data in redis. When the admin app wants to
update data for our API, it currently calls out to the API to
update the data and then deletes any existing/relevant data cached
in redis. The subsequent time it tries to get this data that it just
updated, it will use the usual approach for getting data that will
set the redis key too.

There is a problem with when the admin app wants to update something
in the database. At the moment, it will start by calling the API
to update the database and if that is successful it will then attempt
to delete the relevant cache keys in redis. But redis may not always
be available and in that case, it will
- fail to delete the cache key
- catch and ignore the exception raised when trying to delete the
  redis cache key
This leads to the change the user requested being made in the
database, but the cache still has the old data in it! This is bad
because our apps check the cache first and this can result in us
sending out incorrect emails using old templates, for example. We
have a runbook to manually recover from this position if redis
has downtime and delete queries fail during this time:
https://github.com/alphagov/notifications-manuals/wiki/Support-Runbook#deal-with-redis-outages

Note, there is no issue with `cache.set` calls. If the call to redis
fails as part of this, no stale data is produced. The database hasn't
been changed and either redis hasn't been changed or hasn't had
the current data added to it.

This commit changes our caching logic so that we can't end up with
stale data in redis if a delete to redis fails. If a delete to redis
fails, then we error early and don't attempt to update the database.
The trade off we make here is that now a user will see an error
page if their request failed to delete from redis, whereas before
they would have gotten a 200 (but ended up with stale data). We
think it is worse to have stale data, then it is to fail a
users request.
In the previous commit, we tried to avoid stale data in redis if
redis is unavailable. However, this introduced a race condition
where another request re-populates the redis key with the old value
before the database has been updated. We fix this by also deleting
the cache key after the API update so that if that happened, we would
remove the stale data after the API update.

There is a small edge case that could still end up with stale data
that this commit doesn't solve (and we may just have to tolerate).
If we hit the race condition above so stale data is reinserted in
redis before the database update, then the database update happens
and redis goes down at this point. Then what will happen is the
second cache clear will fail and we will be left with stale data.
This should be a rare case as involves both a race condition happening
so two requests for the same object at very similar times, and redis
going down mid request.

Again, we lean towards throwing an uncaught exception if a redis
delete fails. It means there is a chance of stale data and the
user should hopefully check if their action was successful and
retry if they see stale data in the user interface (which should
then hopefully fix the problem).
Ensure no stale data in redis if redis errors when deleting cache key
first in a series of commits to move phone number and email validation
in to one centralised location too
also have a new shared base class so we don't have phone number errors
extending from email address errors. This was raised six (!) years ago
but never sorted out[^1].

[^1]: https://github.com/alphagov/notifications-api/pull/1590/files#diff-463068b4d0340d64d95bb9d1e816b1d4da643fd30341c637f2745fff3609559bR46-R47
note that i moved the email-specific regex constants out of
notifications_utils/__init__.py and into the email_address file itself.
these are only used by the email_address code itself and in one place
in document download, and now that they're in a smaller file they seem
reasonable to place here
mostly so that in places that use recipient.py, we'll be forced to
update the imports so that they're using the new paths to keep
everywhere up to date
Refactor phone/email/letter validation
The postcode zone is the first 1, 2 or 3 letters of the postcode. There
are about 120 postcode zones defined in the UK.

At the moment we allow postcodes to start with any 1 or 2 letters. This
means we allow plenty of postcode which aren’t real, for example
`NF1 1AA`.

By restricting the first n letters of the postcode to only valid
postcode zones we can reduce the number of letters which DVLA manually
flag to us as having invalid addresses.

***

List taken from: http://www.ons.gov.uk/ons/guide-method/geography/products/postcode-directories/-nspp-/onspd-user-guide-and-version-notes.zip
This saves us clogging up `postal_address.py` with a long list
`GX` is not a valid UK postcode zone so we will reject it anyway, without
having to treat it as a special case.
Restrict postcodes to valid UK postcode zones
Many libraries we're only using much more recent versions than utils has
pinned anyway (like flask or pypdf), so it's easy to bring them up to
the oldest version in use without any risk.

Others have outstanding CVEs (like jinja) so it's good to make sure we
always have secure versions.

Others like boto3 we've already assessed the changelogs elsewhere[^1]

Many major changes were just dropping support for old versions of python

Some like cachetools needed a bit of checking as they changed import
structure (but we're not affected in this instance)

[^1]: alphagov/notifications-api#4082
Pyupgrade[1] automatically upgrades syntax for newer versions of Python.
Newer syntax is often friendlier and less verbose – for example f-string
formatting rather than calls to `str.format`.

It has a total of 42 rules[2].

This commit adds it to our linter config, and then fixes all the
instances of old syntax it found (the vaast majority of these were
automatic by running Ruff with the `--fix` flag).

1. https://pypi.org/project/pyupgrade/
2. https://docs.astral.sh/ruff/rules/#pyupgrade-up
leohemsted and others added 30 commits December 5, 2024 17:26
You can bypass newline truncation[1] by sending the dreaded unicode
‘hangul filler’ character in my personalisations.

This could be utilised by a dodgy character who wants to hide some
portion of the email content (if, say, they've utilised template
injection to create a phishing email)

***

1. Defined at https://github.com/alphagov/notifications-utils/blob/88c57d5eb530b88794814e57249c2403227aa664/notifications_utils/markdown.py#L222
Remove hangul filler character from emails
ruff doesn’t read `setup.cfg` so the rules defined there are redundant.
This is a hangover from when we used to use `flake8` directly.

I’ve copied some of the rules into pyproject.toml

Selected rules:
- B902 and implemented in ruff as N804 and N805. Only copying N804
  because N805 gets tripped up by sqlalchemy/sqlalchemy#9213
- E203 still in preview in ruff, add a note to include it later

Ignored rules:
- W503 and W504 aren’t likely to be implemented in ruff
  astral-sh/ruff#4125
- E203 is still in preview https://docs.astral.sh/ruff/rules/whitespace-before-punctuation/
  so we don’t need to worry about ignoring it (and maybe it will be fit
  for purpose once it’s out of preview)

Cyclomatic complexity:
- this repo has it set to 8, but everywhere else we use the default (10)
  so let’s not copy that into `pyproject.toml`
Remove flake8 config from `setup.cfg`
…-validation"

Temporarily revert the removal of the old phone number validation code while decisions over how to safely incorporate the changes into api and admin are made

This reverts commit 21700cc, reversing
changes made to febec8e.
…refixes first.

When casting the list of keys to a(n unordered) the country code becomes set non-deterministically. This change instead turns the set into an ordered list, ordered with the longest prefixes first to avoid this issue
…validation

Revert "Merge pull request #1158 from alphagov/remove-old-phonenumber validation"
Bump core dependencies to latest versions
We can’t upgrade all apps until psf/requests#6730
is fixed (it stops the API from talking to DVLA)
Downgrade minimum version of requests to 2.32.3
anything else is asking for trouble
returning a CacheResultWrapper from a RequestCache.set-wrapped
function allows the function to contextually decide whether
the result should be cached

internally this uses functools.singledispatch to detect
CacheResultWrapper instances, so should allow further
customization if the design of CacheResultWrapper isn't
convenient for some situation
…pper

`RequestCache`: add `CacheResultWrapper` to allow dynamic cache decisions
This fixes a couple of moderate security vulnerabilities.
Bump minimum jinja2 version to latest
these functions can be used to collect metrics on the greenlet
switching that eventlet is performing behind the scenes and
debug situations where we're seeing suboptimal behaviour

not recommended to be enabled permanently as there is probably
a minor performance penalty to accounting these times
this allows logging.formatting to be imported without implicitly
importing all of these mostly-flask-related logging routines

this is important because we want to be able to use
logging.formatting from the gunicorn config which executes before
eventlet's monkeypatching has been performed, and implicitly
importing flask before monkeypatching adds to the spew of errors
emitted on app startup.
leaving scope for the thread_time function returning None as the
greenlet variant will if the appropriate trace function isn't
installed
eventlet: add greenlet time accounting and metric logging tools
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants